Method of detecting objects

ABSTRACT

At the first frame, the method records the background and a detection threshold frame, the latter initialized as a constant image at a predetermined minimum threshold. With each subsequent frame, it corrects the background and the threshold by formulas, adapting the update constant in each pixel depending on the presence of a detected object; defines the difference between the current frame and the background; compares it with the threshold; combines elements exceeding the threshold into detection zones; rejects some of the detection zones; divides the zones in order to separate shadows; forms tracking zones; searches for segments of already-detected objects; and forms clusters of the tracking zones. The coordinates of the obtained rectangles are taken as the coordinates of the objects located in the frame.

INCORPORATION BY REFERENCE

The present application claims priority from Russian application 2008119711 filed on May 19, 2008, the content of which is hereby incorporated by reference into this application.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to the field of optoelectronic systems and information processing and is suitable for use in security systems and traffic analysis under complex interference caused by regular and temporary changes in lighting, motion of foliage and water, and precipitation such as rain and snow.

2. Description of the Related Art

There is a known method of detecting moving vehicles (Dailey D. J. 1999), which includes obtaining frames, computing the difference between frames, binarization with a threshold, morphological operations, and calculation of the Sobel operator for determining the boundaries of an object.

The disadvantages of this method are low noise tolerance and, consequently, low detection accuracy, because the result is strongly affected by changes in the working scene due to the time of day, weather conditions, and the appearance of new stationary objects in the monitored zone.

The closest in technical essence is a method of detecting moving vehicles [Patent Rf. U.S. Pat. No.2,262,661 (Eremin S. N. 2005)]. It includes obtaining frames, computing the difference between frames, binarization with a threshold, morphological operations, calculation of the Sobel operator, storing the first frame, correcting the background according to a specific formula, defining the difference between the frame and the background, obtaining the histogram of the image, finding the maximum of brightness, verifying the presence of objects, separating merged objects, and forming rectangles that describe the positions of the vehicles, whose coordinates are taken as the locations of the vehicles in the frame.

The disadvantages of this method are the false detection of shadows as objects (vehicles) and the inability to determine the real size of detected objects. Another disadvantage is that, in the case of a false detection or of an object that was brought into the scene and left behind, the update of the background model in the corresponding pixels ceases completely, which makes it impossible to absorb new static objects into the background automatically. Thus, the quality with which this method detects vehicles is insufficient.

SUMMARY OF THE INVENTION

It is an object of some aspects of the present invention to expand the functionality and improve the quality of television monitoring security systems under complex climatic conditions and with a dynamic background by reducing the number of false responses and improving the accuracy of determining the boundaries of moving objects.

We propose a method of detecting objects. In an aspect of the invention, a technique includes the following steps/operations.

1) Obtaining a frame.

2) Establishing the background with its subsequent correction, which is performed using an update constant ρ of the background, chosen in each pixel depending on whether an object is detected there, by the rule:

$\rho = \begin{cases} \rho_{1}, & \text{if the pixel is classified as background} \\ k\rho_{1}, & \text{if the pixel is classified as an object} \end{cases} \quad (1)$

where 0<ρ₁<1 and k is a factor, 0<=k<=1.

3) Calculating the absolute difference between the current frame and the background.

4) Binarization with a threshold frame. The threshold values of pixels are calculated by the formula $p_i = k_1^2\sigma_i^2$, where k₁ is a coefficient and $\sigma_i^2$ is calculated by the moving-average formula $\sigma_i^2 = (1-\rho)\sigma_{i-1}^2 + \rho(\mu_i - I_{i-1})^2$, where $I_{i-1}$ is the previous frame and $\mu_i$ is the current background image. During the binarization the following rule is used:

$r = \begin{cases} 255, & \text{if } (I_{i} - \mu_{i})^{2} > p_{i} \\ 0, & \text{otherwise} \end{cases} \quad (2)$

where I_(i)—the current frame.

5) Performing spatial filtering by creating pre-detection zones in the binarized frame.

6) Eliminating some of the pre-detection zones, which describe (identify) the positions of objects, and dividing the remaining zones into sections.

7) Forming tracking zones, which are independent parts of objects.

8) Eliminating some of the tracking zones and uniting the remaining zones into clusters, after which some of the clusters are eliminated. Zones and clusters are eliminated taking into account their metric sizes and coordinates. The coordinates of the remaining clusters are taken as the coordinates of the objects.

Updating the background frame with a small update constant whose value is chosen by rule (1) makes it possible to automatically include in the background frame objects that stay in the detection zone in a steady state for a long time. Performing spatial filtering with multilevel elimination of the corresponding zones and clusters according to their occupancy, metric sizes and coordinates makes it possible to reduce the number of false responses and to cut off the shadows that objects cast in the detection zone, thereby improving the quality and accuracy of the method. Determining the metric sizes of objects makes it possible to analyze the trajectories of the objects, which extends the functionality of a system implementing the method. Thus the distinctive features are essential and necessary for the solution of the problem presented.
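As an illustration of rule (1), the per-pixel choice of the update constant can be sketched as follows. This is a minimal sketch in plain Python, not the claimed implementation; the function names and the default values of `rho1` and `k` are assumptions made for the example.

```python
def update_constant(is_object, rho1=0.001, k=0.1):
    """Rule (1): rho1 for background pixels, k * rho1 for object pixels."""
    if not (0.0 < rho1 < 1.0 and 0.0 <= k <= 1.0):
        raise ValueError("expected 0 < rho1 < 1 and 0 <= k <= 1")
    return k * rho1 if is_object else rho1


def update_constant_map(object_mask, rho1=0.001, k=0.1):
    """Apply rule (1) to a 2-D mask (nonzero = pixel classified as object)."""
    return [[update_constant(bool(m), rho1, k) for m in row]
            for row in object_mask]


# Illustrative 2x2 mask: one pixel is currently classified as an object.
mask = [[0, 255],
        [0, 0]]
rho = update_constant_map(mask)
```

With k between 0 and 1, object pixels are still updated, only more slowly, so a stopped object is eventually absorbed into the background; with k=0 the update in those pixels stops entirely.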

The other objects and methods of achieving the objects will be readily understood in conjunction with the description of embodiments of the present invention and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows rectangles indicating the zones of pre-detection.

FIG. 2 shows the division of the zones of pre-detection into sections of equal width.

FIG. 3 shows the creation of a new tracking zone from the sections.

FIG. 4 shows the assignment of a section to the tracking zone.

DETAILED DESCRIPTION OF EMBODIMENTS

An apparatus implementing an embodiment of the invention contains at least one video camera, a surveillance terminal adapted to inform the operator and to receive control commands from the operator, and at least one computer with memory, connected by coaxial cable or a local area network.

To implement the method of the embodiment, a sequence of frames from a stationary television camera, either color or black-and-white, can be used, for example. The computer forms an array of elements in memory for each frame. Each pixel of the image is stored in the array as brightness values, for example separately for the red, green and blue channels (RGB representation), separately for the YUV channels, or as a single brightness channel (L. Shapiro 2006 (L. Shapiro, J. Stokman. Computer vision. Moscow: BINOM. Knowledge Lab, 2006), 249-259).

Before starting work it is necessary to set the initial parameters in the device, for example from the operator workplace:

parameters of the camera calibration (the focal length, the sight angle, the angle of rotation around the axis, the mounting height of the camera, the size of the sensor, the sensor resolution); these parameters can be known from direct measurements or camera specification data, or determined automatically using known calibration methods (L. Shapiro 2006, 567-578);

the width w and height h of sections and tracking zones, the minimum and maximum width (W_(CMin), W_(CMax)) and height (H_(CMin), H_(CMax)) of an object, and the coordinates of ignoring areas, that is, parts of the image in which found objects will be discarded;

the update constant ρ₁, the coefficient of change of the update constant k, and the threshold factor k₁;

the coefficient of similarity with the background C_(Bkg), the coefficient of overlapping of sections and tracking zones, the proximity factor with the background, and the degree of similarity for searching for tracking zones from previous frames C_(TrCorr);

the number of frames during which a tracking zone is retained when it is not found in the current frame, the distance between tracking zones for the formation of clusters (for example, in pixels), and the cluster lifetime threshold LT;

the width and height of a tracking zone W_(Merge), H_(Merge);

the percentage of overlapping for assigning sections to a tracking zone C_(Ovr) and the percentage of area ratio for eliminating tracking zones C_(ARatio).

Then the device gets the first frame and initializes the background frame, for example, with zero values or, to reduce the time needed to reach the operating mode, with the first frame. The device also initializes the threshold frame with a constant value, for example 255 if each pixel of the threshold frame is stored in 1 byte.

Then the device gets the next frame and, for each new frame, performs the following:

1) The update constant of the background is determined for each pixel, depending on the presence of detected objects, by rule (1). The update constant is chosen, for example, as ρ₁=0.001, and the factor as k=0.1. The update constant is chosen so that the objects to be detected do not merge with the background frame, while background fluctuations (grass, water, etc.) and slowly moving shadows are filtered out. The factor is chosen according to the desired time of detection of stopped objects τ, for example k≈5/(ρ₁τ).

2) The background frame and the variance are updated by the formula of the exponential moving average (Asely 1995, 184-192):

$\mu_i = (1-\rho)\mu_{i-1} + \rho I_{i-1} \quad (3)$

$\sigma_i^2 = (1-\rho)\sigma_{i-1}^2 + \rho(\mu_i - I_{i-1})^2 \quad (4)$

where $I_i$ is the current frame, $I_{i-1}$ is the previous frame, $\mu_i$ is the current background frame, $\mu_{i-1}$ is the previous background frame, $\sigma_i^2$ is the current value of the variance, $\sigma_{i-1}^2$ is the previous value of the variance, and i is the number of the current frame.

3) The absolute difference between the current frame and the background frame is calculated in each pixel (L. Shapiro 2006, 329-331), forming the difference frame D:

$D_i = |I_i - \mu_i| \quad (5)$

4) The threshold frame is formed by the formula:

$p_i = k_1\sigma_i, \quad (6)$

where k₁=3 . . . 5 is a constant factor. A greater value is chosen for scenes with more intense noise, and a smaller one for static scenes.

5) Binarization is performed: the difference frame is compared with the threshold frame by rule (2), forming a binary frame r;

6) When color images are used, the color channels are united into one. For example, when pixel values are given as red, green and blue components (RGB representation (L. Shapiro 2006, 249-252)), the union is produced by the rule:

$m = r^{R}\ \mathrm{OR}\ r^{G}\ \mathrm{OR}\ r^{B}, \quad (7)$

where m is the resulting value, r^(R), r^(G) and r^(B) are the values in the red, green and blue channels respectively, and "OR" is the boolean "OR" function, such that if the value of any of the arguments is different from zero, the result is equal to 1, otherwise 0.

Spatial filtering is performed as follows:

7) Mark all connected areas of nonzero value in the binary frame using any known method (L. Shapiro 2006, 84-92) and create from these areas the pre-detection zones as circumscribing rectangles, whose boundary coordinates are the coordinates of the zones. The rectangles are built in a well-known way, by choosing the extreme right, top, bottom and left points of each area and drawing through them, respectively, the right, top, bottom and left sides of the rectangle (FIG. 1);

8) calculate the metric sizes of the pre-detection zones using, for example, calibrated cameras, and reject zones by width w_(min), w_(max) and height h_(min), h_(max);

9) each pre-detection zone in the binary frame is divided into sections of equal width (FIG. 2). The width w is defined at the setting stage, based on the expected size of the detected objects; for example, for a person or a car, the width w=0.2 m can be chosen. The width of a section is refined in pixels so that the zone can be divided into an integer number of sections: the integer number of sections of width w that fit inside the zone is calculated, and the width of the zone is then divided by this number, giving the required section width.

10) New tracking zones are created (FIG. 3). To do this, a tracking zone of predefined metric height h and metric width w is first created for the section located closest to the central point of the bottom edge of the frame (for example, h=0.8 m for a person, so that the person is detected even if half of the body is hidden by bushes, merged with the background or not detected for other reasons). Then, for each section, the area of overlap with the tracking zone is calculated (FIG. 4). If the ratio of this area to the area of the section exceeds the specified threshold, for example 60%, the section is assigned to the tracking zone. The procedure is repeated while unprocessed sections remain.

11) For each tracking zone, the total area of the sections assigned to it and the total area of overlap are calculated. Tracking zones in which the ratio of the total area of overlap to the total area of sections exceeds the specified threshold, for example 50%, are considered reliable, and all sections assigned to them are excluded from further processing; otherwise the tracking zone is rejected.

12) The tracking zones are compared with the background frame, for instance by calculating a correlation function; zones for which the value of the function exceeds the specified correlation threshold (60%) are rejected. The tracking zones formed at previous frames are then searched for in the current frame (for example, by correlation). Zones for which the value of the matching function (correlation) exceeds the specified threshold, for example 70%, at some displacement (that is, matched zones) are added to the list of new tracking zones. A zone for which no correspondence is found in more than, for example, N_(miss)=5 successive frames is rejected (L. Shapiro 2006, 219-225);

13) unite closely located new tracking zones into clusters, for example tracking zones whose boundaries are less than a given distance apart, such as 5 pixels or 10 centimeters;

14) calculate the metric sizes of the clusters using, for example, calibrated cameras, and reject clusters by width W_(CMin), W_(CMax) and height H_(CMin), H_(CMax) and by their position relative to the defined ignoring zones (L. Shapiro 2006, 578-580). For example, if the lower mid-point of a cluster (the "legs") falls inside a defined ignoring zone, the cluster is rejected;

15) the lifetime of a cluster is defined as the number of frames during which the cluster has been detected. For this purpose, the remaining clusters are compared with the clusters saved in previous frames. For a cluster for which a pair with close coordinates and size is found, the frame counter of the earlier cluster is copied and increased by one. Clusters whose frame counter is less than a specified threshold, for example 40, are retained but excluded from further processing at the current frame. The threshold is chosen experimentally so as to prevent false detection of short-lived objects. For clusters that have no matching pair in the previous frame, the lifetime is set to 1, which gives the initial value for subsequent frames.

16) the coordinates of the clusters that successfully pass rejection are accepted as the coordinates of the objects in the frame.
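Step 1 above ties the factor k to the desired time τ after which a stopped object should be absorbed into the background. The reasoning can be made explicit: an exponential moving average with update constant kρ₁ has a time constant of roughly 1/(kρ₁) frames, and after about five time constants the weight of the old background has decayed below 1%, which gives k≈5/(ρ₁τ). The sketch below assumes τ is measured in frames and clamps the result to the range allowed by rule (1); the function name and the clamping are assumptions of this example.

```python
def absorption_factor(rho1, tau_frames):
    """Estimate k from k ~ 5 / (rho1 * tau), clamped to k <= 1 (rule (1)).

    tau_frames: desired absorption time of a stopped object, in frames
    (the unit is an assumption of this sketch).
    """
    if not 0.0 < rho1 < 1.0:
        raise ValueError("expected 0 < rho1 < 1")
    if tau_frames <= 0:
        raise ValueError("tau_frames must be positive")
    return min(1.0, 5.0 / (rho1 * tau_frames))


k_example = absorption_factor(0.001, 50000)   # approximately 0.1
```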

An example of concrete embodiment is described below.

A sequence of black-and-white television frames from a video surveillance camera is fed, frame by frame, to the input of the device that implements the method. Using calibration methods, the following calibration parameters are determined from one of the frames and the camera nameplate data: the sine and cosine of the camera tilt angle sin (α)=0.997564, cos (α)=0.069756, the mounting height of the camera H=4 m, the sine and cosine of the camera rotation angle sin (β)=0, cos (β)=1, the metric size of a pixel W_(px)=0.0003, H_(px)=0.0003, the camera focal length f=0.24, and the height of the horizon line in the frame H_(hor)=55.

Define the update constant ρ₁=0.005, the factor k=0.2, and the factor k₁=3. Define restrictions on the size of the pre-detection zones in meters: w_(min)=0.1, w_(max)=2, h_(min)=0.1, h_(max)=3.

Define the threshold of similarity with the background C_(Bkg)=70% and the rate of expansion of the tracking zone when creating the search zone, in width C_(TrWExp)=1 and in height C_(TrHExp)=1; that is, the search takes place in the area obtained by expanding the tracking zone by half of its width to the right and to the left, and likewise by half of its height upward and downward.

Define the degree of similarity for search of tracking zones from previous frames C_(TrCorr)=60%.

Define fill rate of tracking zones C_(TrFill)=30%.

Define rate of zones overlap C_(ZOver)=95%.

Define the number of frames during which a tracking zone is retained if it is not found in the current frame, N_(mf)=1.

Define the distance between the tracking zones for the formation of clusters CF=5 pixels.

Define the cluster lifetime threshold LT=40 frames.

Define the permissible cluster width and height W_(CMin)=0, W_(CMax)=2 m, H_(CMin)=0, H_(CMax)=4 m.

Define width and height of tracking zone W_(Merge)=0.2 m, H_(Merge)=0.8 m.

Define the percentage of overlapping for assigning a section to a tracking zone C_(Over)=10% and the percentage of area for rejection of tracking zones C_(ARatio)=50%.

Take the first frame with values I₁ ^(0,0,R)=4, I₁ ^(0,0,G)=0, I₁ ^(0,0,B)=0, . . . , I₁ ^(319,239,B)=176 and use it to initialize the background frame B₁=I₁. Here and below the upper indices denote pixel coordinates in the frame: the first index is the column, the second is the row, and the third (R, G or B) is the color channel.

Set the pixel values of the threshold frame equal to 255 in each color channel: p₁ ^(0,0,R)=255, p₁ ^(0,0,G)=255, p₁ ^(0,0,B)=255, . . . , p₁ ^(319,239,B)=255.

Take the second frame with values I₂ ^(0,0,R)=6, I₂ ^(0,0,G)=0, I₂ ^(0,0,B)=6, . . . , I₂ ^(319,239,B)=178. Update the background and threshold frames with formulas (3), (4) and (6). Obtain B₂ ^(0,0,R)=4, B₂ ^(0,0,G)=0, B₂ ^(0,0,B)=0, . . . , B₂ ^(319,239,B)=176, p₂ ^(0,0,R)=255, p₂ ^(0,0,G)=255, p₂ ^(0,0,B)=255, . . . , p₂ ^(319,239,B)=255.
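The per-pixel update by formulas (3), (4) and (6) can be sketched as follows. This is an illustrative single-pixel version in plain Python; the function names are hypothetical, the numeric inputs are made up for the sketch, and it does not attempt to reproduce the threshold values listed in this example.

```python
import math


def update_pixel(mu_prev, var_prev, frame_prev, rho):
    """One step of the exponential moving average for a single pixel."""
    mu = (1.0 - rho) * mu_prev + rho * frame_prev                  # formula (3)
    var = (1.0 - rho) * var_prev + rho * (mu - frame_prev) ** 2    # formula (4)
    return mu, var


def threshold(var, k1=3.0):
    """Formula (6): the threshold is k1 times the standard deviation."""
    return k1 * math.sqrt(var)


# Illustrative values only: previous background 4, previous frame 6, rho = 0.5.
mu, var = update_pixel(mu_prev=4.0, var_prev=0.0, frame_prev=6.0, rho=0.5)
p = threshold(var)
```

Note that formula (4) uses the freshly updated background value, so the two updates have to be applied in this order.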

Calculate the difference frame D with formula (5) and obtain D₂ ^(0,0,R)=2, D₂ ^(0,0,G)=0, D₂ ^(0,0,B)=6, . . . , D₂ ^(319,239,B)=2.

Perform its binarization using threshold frame: γ₂ ^(0,0,R)=0, γ₂ ^(0,0,G)=0, γ₂ ^(0,0,B)=0, . . . , γ₂ ^(319,239,B)=0.
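A minimal sketch of the difference and binarization steps, assuming frames are stored as nested lists of numbers for a single channel. Comparing |I−μ| with p=k₁σ, as in formulas (5) and (6), gives the same result as comparing (I−μ)² with k₁²σ², as in rule (2), because both sides are non-negative. The function name and the sample values are illustrative.

```python
def binarize(frame, background, thresh):
    """Return 255 where |frame - background| exceeds the threshold, else 0."""
    height, width = len(frame), len(frame[0])
    return [[255 if abs(frame[y][x] - background[y][x]) > thresh[y][x] else 0
             for x in range(width)]
            for y in range(height)]


# One row, two pixels: the first stays below its threshold, the second exceeds it.
binary = binarize(frame=[[6, 20]], background=[[4, 4]], thresh=[[255, 6]])
```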

Merge the color channels by the OR rule and obtain: m₂ ^(0,0)=0, m₂ ^(1,0)=0, m₂ ^(2,0)=0, . . . , m₂ ^(319,239)=0.
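The channel merge of formula (7) can be sketched as below. The sketch assumes the three binary channel frames are nested lists of equal size and writes 255 for a nonzero result, matching the labelling value used in this example; that output value, like the function name, is an assumption of the sketch.

```python
def merge_channels(r_red, r_green, r_blue):
    """Pixel-wise OR of three binary frames; nonzero in any channel gives 255."""
    return [[255 if (a or b or c) else 0
             for a, b, c in zip(row_r, row_g, row_b)]
            for row_r, row_g, row_b in zip(r_red, r_green, r_blue)]


merged = merge_channels([[0, 0]], [[0, 255]], [[0, 0]])
```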

Label connected areas (with pixel value of 255) and receive the number of areas=0.

Further processing is not executed for the second frame.

For frames from the third to 9750, execute similar actions.

Take the 9751st frame with values I₉₇₅₁ ^(0,0,R)=2, I₉₇₅₁ ^(0,0,G)=0, I₉₇₅₁ ^(0,0,B)=5, I₉₇₅₁ ^(1,0,R)=0, I₉₇₅₁ ^(1,0,G)=0, I₉₇₅₁ ^(1,0,B)=1, I₉₇₅₁ ^(2,0,R)=4, I₉₇₅₁ ^(2,0,G)=3, I₉₇₅₁ ^(2,0,B)=5, . . . , I₉₇₅₁ ^(319,239,B)=177.

Perform updating of the background and threshold frames based on the formulas (3), (4) and (6).

Obtain B₉₇₅₁ ^(0,0,R)=2, B₉₇₅₁ ^(0,0,G)=0, B₉₇₅₁ ^(0,0,B)=2, B₉₇₅₁ ^(1,0,R)=2, B₉₇₅₁ ^(1,0,G)=0, B₉₇₅₁ ^(1,0,B)=2, B₉₇₅₁ ^(2,0,R)=8, B₉₇₅₁ ^(2,0,G)=5, B₉₇₅₁ ^(2,0,B)=9, . . . , B₉₇₅₁ ^(319,239,B)=176, p₉₇₅₁ ^(0,0,R)=6, p₉₇₅₁ ^(0,0,G)=6, p₉₇₅₁ ^(0,0,B)=6, p₉₇₅₁ ^(1,0,R)=7, p₉₇₅₁ ^(1,0,G)=6, p₉₇₅₁ ^(1,0,B)=6, p₉₇₅₁ ^(2,0,R)=5, p₉₇₅₁ ^(2,0,G)=9, p₉₇₅₁ ^(2,0,B)=11, . . . , p₉₇₅₁ ^(319,239,B)=6.

Calculate difference frame D based on formula (5) and receive D₉₇₅₁ ^(0,0,R)=6, D₉₇₅₁ ^(0,0,G)=0, D₉₇₅₁ ^(0,0,B)=3, D₉₇₅₁ ^(1,0,R)=2, D₉₇₅₁ ^(1,0,G)=0, D₉₇₅₁ ^(1,0,B)=1, D₉₇₅₁ ^(2,0,R)=6, D₉₇₅₁ ^(2,0,G)=2, D₉₇₅₁ ^(2,0,B)=4, . . . , D₉₇₅₁ ^(319,239,B)=1.

Perform its binarization using threshold frame: γ₉₇₅₁ ^(0,0,R)=0, γ₉₇₅₁ ^(0,0,G)=0, γ₉₇₅₁ ^(0,0,B)=0, γ₉₇₅₁ ^(1,0,R)=0, γ₉₇₅₁ ^(1,0,G)=0, γ₉₇₅₁ ^(1,0,B)=0, γ₉₇₅₁ ^(2,0,R)=0, γ₉₇₅₁ ^(2,0,G)=0, γ₉₇₅₁ ^(2,0,B)=0, . . . , γ₉₇₅₁ ^(319,239,B)=0.

Merge color channels based on OR-rule and receive: m₉₇₅₁ ^(0,0)=0, m₉₇₅₁ ^(1,0)=0, m₉₇₅₁ ^(2,0)=255, m₉₇₅₁ ^(3,0)=0, m₉₇₅₁ ^(4,0)=0, m₉₇₅₁ ^(5,0)=0, m₉₇₅₁ ^(6,0)=255, m₉₇₅₁ ^(7,0)=0, m₉₇₅₁ ^(8,0)=0, . . . , m₉₇₅₁ ^(319,239)=0.

Label connected areas and receive the number of areas=928.

Create pre-detection zones by generating circumscribing rectangles: Dz₀={62, 14, 62, 15}, . . . , Dz₉₂₇={200,238,203,239}, where the coordinates are given in the following order: (left border, upper border, right border, lower border).
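Labelling connected areas and building their circumscribing rectangles can be sketched as follows. The text allows any known labelling method; this sketch uses a breadth-first flood fill with 4-connectivity, which is an assumption, and returns rectangles as (left, top, right, bottom) with inclusive pixel coordinates.

```python
from collections import deque


def bounding_rectangles(binary):
    """Bounding rectangle (left, top, right, bottom) of each connected area."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            left = right = x0
            top = bottom = y0
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                # Visit the four direct neighbours (4-connectivity).
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            rects.append((left, top, right, bottom))
    return rects


demo = [
    [0, 255, 255, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 255],
]
rects = bounding_rectangles(demo)
```

The number of rectangles returned corresponds to the number of labelled areas.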

Calculate the size of each zone in meters and receive: width of Dz₀=1.38 m, height of Dz₀=2.66 m, . . . , width of Dz₉₂₇=0.11 m, height of Dz₉₂₇=0.10 m.

Filter out the pre-detection zone due to the metric size and obtain 119 zones.

The pre-detection zones in the binary frame are divided into sections. Obtain the coordinates of 149 sections: Sz₀={11, 14, 11, 14}, . . . , Sz₉₂₇={200,238,203,239}.
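The division of a zone into sections can be sketched as below. The sketch assumes the nominal section width w has already been converted from meters to whole pixels using the camera calibration (that conversion is not shown), and that rectangles use inclusive pixel coordinates. Because pixel columns are integers, section widths may differ by one pixel; this rounding choice is an assumption of the sketch.

```python
def split_into_sections(zone, nominal_width_px):
    """Split a zone (left, top, right, bottom) into vertical sections.

    The number of sections is the whole number of nominal widths that fit
    in the zone (at least one); the zone width is then shared among them.
    """
    if nominal_width_px < 1:
        raise ValueError("nominal_width_px must be at least 1")
    left, top, right, bottom = zone
    zone_width = right - left + 1
    count = max(1, zone_width // nominal_width_px)
    sections = []
    for i in range(count):
        x_start = left + (i * zone_width) // count
        x_end = left + ((i + 1) * zone_width) // count - 1
        sections.append((x_start, top, x_end, bottom))
    return sections


sections = split_into_sections((10, 0, 20, 5), nominal_width_px=4)
```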

Form new tracking zones from the sections. The first section chosen is the one closest to the bottom central point of the frame, that is, the section with coordinates (118, 128, 121, 163), and a tracking zone with the predefined metric height of 0.8 m and width of 0.2 m is created for it. Obtain a tracking zone with coordinates (117, 150, 121, 163). The relevant section is excluded from further processing.

Then, for each of the remaining sections, calculate the area of overlap with the tracking zone. For the section with coordinates (113, 126, 117, 165), the area of overlap is 14 and the area of the section is 169. Because the ratio of the overlap area to the section area does not exceed the specified threshold of 10%, the section is not added to the tracking zone. Repeat the procedure for the remaining sections.

The procedure for forming tracking zones is repeated as long as unassigned sections remain.

For each tracking zone, calculate the ratio of the total area of overlap of the sections with the tracking zone to the total area of the sections attached to it. For the tracking zone with the coordinates {1, 18, 1, 18} the value 0.97 is obtained. Since this value is more than the specified threshold, the tracking zone is considered reliable.
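The overlap test for assigning a section to a tracking zone can be sketched as follows. Rectangles are (left, top, right, bottom); treating both borders as inclusive pixel coordinates is an assumption of this sketch, so areas computed this way need not match every figure quoted in the example. The function names and the default ratio are illustrative.

```python
def area(rect):
    """Area of a rectangle with inclusive pixel borders."""
    left, top, right, bottom = rect
    return (right - left + 1) * (bottom - top + 1)


def overlap_area(a, b):
    """Area of the intersection of two rectangles (0 if they do not meet)."""
    width = min(a[2], b[2]) - max(a[0], b[0]) + 1
    height = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return width * height if width > 0 and height > 0 else 0


def belongs_to_zone(section, zone, min_ratio=0.10):
    """Assign the section if overlap / section area exceeds the threshold."""
    return overlap_area(section, zone) / area(section) > min_ratio


section = (113, 126, 117, 165)
zone = (117, 150, 121, 163)
assigned = belongs_to_zone(section, zone)
```

For these two rectangles the overlap is a single column of 14 pixels, and the section is not assigned.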
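The reliability check of a tracking zone can be sketched as a ratio of two sums, following the definition in step 11 (total overlap area divided by total area of the assigned sections). The function name, the input lists and the default threshold of 50% are illustrative assumptions.

```python
def zone_is_reliable(section_areas, overlap_areas, min_ratio=0.5):
    """True if total overlap / total section area exceeds min_ratio."""
    total_sections = sum(section_areas)
    if total_sections == 0:
        return False  # no sections assigned: nothing supports the zone
    return sum(overlap_areas) / total_sections > min_ratio


reliable = zone_is_reliable(section_areas=[100, 50], overlap_areas=[97, 48])
```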

It turns out that in the current frame there is a tracking zone with the coordinates {1, 18, 1, 18}.

Unite the tracking zones into one cluster if the distance between their boundaries is less than 5 pixels. In this case, form one cluster C1₀={1, 18, 1, 18}.

Calculate the metric size of the cluster using the camera calibration. Obtain a cluster width of 0.83 m and a height of 0.81 m.

Perform rejection based on the width W_(CMin), W_(CMax) and the height H_(CMin), H_(CMax) of the clusters.

Processing continues with the clusters that passed the rejection.

Since no ignoring zone is defined, rejection by position is not performed.
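Rejection of clusters by metric size and by ignoring zones can be sketched as below. The metric width and height are assumed to be supplied by the camera calibration (not shown); the default limits follow the values defined for this example, and the test point is the lower mid-point of the cluster rectangle. Function names and the inclusive comparisons are assumptions of the sketch.

```python
def passes_size(width_m, height_m, w_min=0.0, w_max=2.0, h_min=0.0, h_max=4.0):
    """True if the metric size lies within the permitted cluster limits."""
    return w_min <= width_m <= w_max and h_min <= height_m <= h_max


def in_ignore_zone(cluster, ignore_zones):
    """True if the lower mid-point of the cluster lies in any ignoring zone."""
    left, top, right, bottom = cluster
    foot_x, foot_y = (left + right) / 2.0, bottom
    return any(zl <= foot_x <= zr and zt <= foot_y <= zb
               for zl, zt, zr, zb in ignore_zones)


def keep_cluster(cluster, width_m, height_m, ignore_zones=()):
    """Combine the size check and the ignoring-zone check."""
    return (passes_size(width_m, height_m)
            and not in_ignore_zone(cluster, ignore_zones))


kept = keep_cluster((1, 18, 1, 18), width_m=0.83, height_m=0.81)
```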

Calculate the lifetime of each current cluster as the number of frames in which it has been detected. To do this, compare the remaining clusters with the clusters stored in the previous frames. The lifetime of the cluster is set to 1 because no cluster has been detected previously.

Since that value does not exceed the defined threshold of 40, processing is finished for the current frame, but the cluster is stored to be processed in subsequent frames.

Perform similar calculations for each subsequent frame up to the 9819th frame.

Take 9820th frame with values I₉₈₂₀ ^(0,0,R)=3, I₉₈₂₀ ^(0,0,G)=2, I₉₈₂₀ ^(0,0,B)=0, I₉₈₂₀ ^(1,0,R)=2, I₉₈₂₀ ^(1,0,G)=1, I₉₈₂₀ ^(1,0,B)=0, I₉₈₂₀ ^(2,0,R)=9, I₉₈₂₀ ^(2,0,G)=8, I₉₈₂₀ ^(2,0,B)=13, . . . , I₉₈₂₀ ^(319,239,B)=176.

Perform updating the background and threshold frames based on the formulas (3), (4) and (6).

Receive B₉₈₂₀ ^(0,0,R)=2, B₉₈₂₀ ^(0,0,G)=0, B₉₈₂₀ ^(0,0,B)=2, B₉₈₂₀ ^(1,0,R)=2, B₉₈₂₀ ^(1,0,G)=0, B₉₈₂₀ ^(1,0,B)=2, B₉₈₂₀ ^(2,0,R)=8, B₉₈₂₀ ^(2,0,G)=5, B₉₈₂₀ ^(2,0,B)=9, . . . , B₉₈₂₀ ^(319,239,B)=176, p₉₈₂₀ ^(0,0,R)=6, p₉₈₂₀ ^(0,0,G)=6, p₉₈₂₀ ^(0,0,B)=6, p₉₈₂₀ ^(1,0,R)=7, p₉₈₂₀ ^(1,0,G)=6, p₉₈₂₀ ^(1,0,B)=6, p₉₈₂₀ ^(2,0,R)=12, p₉₈₂₀ ^(2,0,G)=8, p₉₈₂₀ ^(2,0,B)=11, . . . , p₉₈₂₀ ^(319,239,B)=6.

Calculate difference frame D with formula (5) and receive D₉₈₂₀ ^(0,0,R)=1, D₉₈₂₀ ^(0,0,G)=2, D₉₈₂₀ ^(0,0,B)=2, D₉₈₂₀ ^(1,0,R)=0, D₉₈₂₀ ^(1,0,G)=1, D₉₈₂₀ ^(1,0,B)=2, D₉₈₂₀ ^(2,0,R)=1, D₉₈₂₀ ^(2,0,G)=2, D₉₈₂₀ ^(2,0,B)=3, . . . , D₉₈₂₀ ^(319,239,B)=0.

Perform its binarization using threshold frame: γ₉₈₂₀ ^(0,0,R)=0, γ₉₈₂₀ ^(0,0,G)=0, γ₉₈₂₀ ^(0,0,B)=0, γ₉₈₂₀ ^(1,0,R)=0, γ₉₈₂₀ ^(1,0,G)=0, γ₉₈₂₀ ^(1,0,B)=0, γ₉₈₂₀ ^(2,0,R)=0, γ₉₈₂₀ ^(2,0,G)=0, γ₉₈₂₀ ^(2,0,B)=0, . . . , γ₉₈₂₀ ^(319,239,B)=0.

Merge color channels based on OR-rule and receive: m₉₈₂₀ ^(0,0)=0, m₉₈₂₀ ^(1,0)=0, m₉₈₂₀ ^(2,0)=0, m₉₈₂₀ ^(3,0)=0, m₉₈₂₀ ^(4,0)=0, m₉₈₂₀ ^(5,0)=0, m₉₈₂₀ ^(6,0)=0, m₉₈₂₀ ^(7,0)=0, m₉₈₂₀ ^(8,0)=0, . . . , m₉₈₂₀ ^(319,239)=0.

Label non-zero connected areas and receive the number of areas=837.

Create pre-detection zones by generating circumscribing rectangles:

Dz₀={115, 19, 116, 22}, . . . , Dz₈₃₆={4, 163, 12, 167}, where the coordinates can be found in the following order: (x-left border, y-upper border, x-right border, y-bottom border).

Calculate the size of each zone in meters and receive: width of Dz₀=1.38 m, height of Dz₀=2.66 m, . . . , width of Dz₈₃₆=0.36 m, height of Dz₈₃₆=0.29 m.

Reject the pre-detection zone due to the size and receive 78 zones.

The pre-detection zones in the binary frame are divided into sections. Obtain the coordinates of 109 sections: Sz₀={115, 21, 115, 21}, . . . , Sz₁₀₈={4, 163, 12, 167}.

Form new tracking zones from the sections. For the first section, on a position closest to the bottom central point of frame, that is, section with coordinates {100, 135, 104, 165}, a tracking zone satisfying predefined metric height of 0.8 m and width of 0.2 m is created. Get tracking zone with coordinates {100, 152, 104, 165}. Appropriate sections are excluded from further processing.

Then, for each of the remaining sections, calculate the area of overlap with the tracking zone. For the section with coordinates {100, 135, 104, 165} the overlap area obtained is 155. The area of the section is 155. Since the ratio of the overlap area to the section area exceeds the assigned threshold of 10%, the section is attached to this tracking zone.

Other sections are not attached to this tracking zone, since there is no intersection.

The procedure is repeated as long as unprocessed sections remain.

For each tracking zone, calculate the ratio of the total area of overlap of the sections with the tracking zone to the total area of the sections attached to it. For the tracking zone with the coordinates {100, 152, 104, 165} the value obtained is 1. Since this value is more than the specified threshold, the tracking zone is considered reliable.

As a result, obtain 37 tracking zones with coordinates: {108, 149, 112, 162}, {139, 69, 140, 76}, . . . , {2, 26, 2, 27}.

Compare the tracking zones with the background frame by calculating the normalized correlation. For the zone with the coordinates {116, 21, 116, 22} the similarity does not exceed the threshold of 60%, so this zone is retained for tracking; all other zones are rejected because their similarity with the background exceeds 60%.

After creating of the new tracking zones, get one zone with the coordinates {116, 21, 116, 22}.

Search the current frame for the tracking zones formed in the previous frames. Zones for which the value of the similarity function at some displacement exceeds the specified threshold of 70% (that is, a correspondence is found) are added to the list of new tracking zones. Zones for which no correspondence is found are rejected.

A correspondence is found for 24 tracking zones, which are added to the list of new tracking zones.

Thus, 25 tracking zones are obtained, with the coordinates: Tr₀={35, 132, 39, 145}, Tr₁={35, 125, 39, 138}, . . . , Tr₂₄={116, 21, 116, 22}.

Merge into one cluster the tracking zones whose boundaries are less than 5 pixels apart. In this case four clusters are formed: C1₀={30, 125, 39, 159}, C1₁={96, 125, 109, 166}, C1₂={63, 116, 63, 157}, C1₃={116, 21, 116, 22}.
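Grouping tracking zones into clusters by boundary distance can be sketched with a small union-find. The text does not say how the distance between boundaries is measured; this sketch takes the larger of the horizontal and vertical gaps between two rectangles, which is an assumption, and reports each cluster as the rectangle circumscribing its zones.

```python
def boundary_gap(a, b):
    """Gap in pixels between two rectangles (0 if they touch or overlap)."""
    gap_x = max(a[0] - b[2], b[0] - a[2], 0)
    gap_y = max(a[1] - b[3], b[1] - a[3], 0)
    return max(gap_x, gap_y)


def cluster_zones(zones, max_gap=5):
    """Merge zones whose boundary gap is below max_gap; return cluster rects."""
    parent = list(range(len(zones)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(zones)):
        for j in range(i + 1, len(zones)):
            if boundary_gap(zones[i], zones[j]) < max_gap:
                parent[find(j)] = find(i)

    groups = {}
    for i, zone in enumerate(zones):
        groups.setdefault(find(i), []).append(zone)
    return [(min(z[0] for z in g), min(z[1] for z in g),
             max(z[2] for z in g), max(z[3] for z in g))
            for g in groups.values()]


# Three of the tracking zones listed above: the first two overlap each other.
clusters = cluster_zones([(35, 132, 39, 145), (35, 125, 39, 138),
                          (116, 21, 116, 22)])
```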

Calculate metric size of clusters using calibrated cameras. Obtain: for C1₀ cluster width of 0.21 m, height of 0.79 m, for C1₁ cluster width of 0.20 m, height of 0.78 m, for C1₂ cluster width of 0.38 m, height of 1.89 m, for C1₃ cluster width of 0.54 m, height of 1.06 m.

Perform rejection by the width W_(CMin), W_(CMax) and height H_(CMin), H_(CMax) of the clusters. The clusters pass the rejection, so processing continues. Since no ignoring zones are assigned, rejection by position is not performed.

The lifetime of a cluster is defined as the number of frames during which the cluster has been detected. To calculate the lifetime, the remaining clusters are compared with the clusters saved in previous frames.

For the cluster C1₀ the device finds that no cluster with close coordinates was detected before. Therefore, the device sets the lifetime of the cluster to 1. Processing of this cluster is finished for the current frame, but the cluster is preserved for the following frames.

For the cluster C1₁ the device finds that a cluster with the coordinates {95,126,108,166} was found earlier and decides that this is the same cluster. The device copies the lifetime of that cluster, increases it by one and obtains 26. Since this value does not exceed the specified threshold of 40, the device finishes processing of this cluster in this frame, but preserves the cluster for the following frames.

For the cluster C1₂ the device finds that a cluster with the coordinates {53, 116, 63, 157} was found earlier and decides that this is the same cluster. The device copies the lifetime of that cluster, increases it by one and obtains 41. Since this value exceeds the threshold of 40, the coordinates of the cluster are adopted as the coordinates of an object located in the frame.

For the cluster C1₃ the device finds that no cluster with close coordinates was detected before. Therefore, the device sets the lifetime of the cluster to 1. Processing of this cluster in this frame is finished, but the cluster is preserved for the following frames.
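The lifetime bookkeeping described above can be sketched as follows. The matching criterion ("close coordinates and size") is not quantified in the text, so the per-coordinate tolerance of 10 pixels is a hypothetical choice, as are the function names; the stored lifetimes 25 and 40 are inferred from the values 26 and 41 obtained above.

```python
def is_same_cluster(a, b, tolerance=10):
    """Treat two rectangles as the same cluster if all borders are close."""
    return all(abs(p - q) <= tolerance for p, q in zip(a, b))


def update_lifetimes(current, previous, threshold=40, tolerance=10):
    """previous: list of (rect, lifetime). Returns (tracked, detected)."""
    tracked, detected = [], []
    for rect in current:
        lifetime = 1  # default for a cluster with no earlier match
        for old_rect, old_lifetime in previous:
            if is_same_cluster(rect, old_rect, tolerance):
                lifetime = old_lifetime + 1
                break
        tracked.append((rect, lifetime))
        if lifetime > threshold:
            detected.append(rect)
    return tracked, detected


previous = [((95, 126, 108, 166), 25), ((53, 116, 63, 157), 40)]
current = [(30, 125, 39, 159), (96, 125, 109, 166),
           (63, 116, 63, 157), (116, 21, 116, 22)]
tracked, detected = update_lifetimes(current, previous)
```

Only clusters whose lifetime exceeds the threshold are reported as objects; the others are kept in `tracked` so that their counters can grow in later frames.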

Thus, a decision is made that an object with the screen coordinates {63, 116, 63, 157} and a metric size of 0.38 m by 1.89 m is detected in the current frame.

The device performs similar calculations for each subsequent frame.

One technical result of the embodiment is a reduction in the number of false responses, regardless of interference from dynamic noise of varying intensity.

While the present invention has been described with reference to the particular illustrative embodiments, it is not to be restricted by those embodiments but only by the appended claims. It is to be appreciated that those skilled in the art can change or modify the embodiments without departing from the scope and spirit of the present invention. 

1. A method of detecting objects, including receiving frames of video, initializing of a background with its subsequent correction, calculation of a difference between a current frame and the background, binarization with a threshold frame, spatial filtering, forming zones, and describing a position of objects, comprising the steps of: determining a constant of updating background ρ in each pixel, depending on the detection of an object under a rule: ρ=ρ₁, if the pixel is classified as the background, ρ=k*ρ₁, if the pixel is classified as the object, where 0<ρ₁<1, k is a first factor, 0<=k<=1; calculating the threshold frame by a formula p_(i)=k₁ ²σ_(i) ², where k₁ is a second factor, and a variance σ_(i) ² is updated by a formula σ_(i) ²=(1−ρ)σ_(i−1) ²+ρ(μ_(i)−I_(i))², where I_(i) is a current frame, μ_(i) is a current background frame, and i is a frame number; and performing spatial-filtering by shaping the zones detected preliminary in the binarized image.
 2. The method according to claim 1, further comprising the steps of: after the step of performing the spatial-filtering: filtering out some of the zones; dividing the rest of the zones into sections; forming tracking zones; filtering out a part of the tracking zones; forming clusters of the tracking zones by combining the zones located close to each other, calculating cluster metric sizes, coordinates and life-time; applying rejection of clusters; and accepting the coordinates of remaining clusters as the coordinates of objects, located in the frame. 